An energy normalization scheme for improved robustness in speech recognition
Authors
Abstract
The log energy parameter has long been used as an extension to the basic cepstral feature vector in speech recognition, and normalizing this parameter is also widely accepted practice. In this paper, a simple energy normalization scheme is introduced that allows direct use of the frame energy parameter in speech recognition and performs well in the presence of noise. Combined with traditional cepstral mean and variance normalization, it has led to error rate improvements of up to 55% on the Aurora 2 task, compared with the baseline clean-trained system using a feature set that includes the log energy parameter. This result has been obtained with neither complicated programming nor computationally expensive routines. The performance of this scheme on an utterance-wide basis has been close to that of off-line speaker-wide normalization, which makes it a good candidate for practical systems.
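As a rough illustration of the utterance-wide cepstral mean and variance normalization mentioned in the abstract, the following is a minimal sketch, assuming the features arrive as a list of per-frame vectors. It covers only standard CMVN; the energy normalization scheme proposed in the paper is not specified in this abstract and is not reproduced here. The function name `cmvn` and the `eps` variance floor are illustrative assumptions, not taken from the paper.

```python
import math


def cmvn(frames, eps=1e-10):
    """Utterance-wide cepstral mean and variance normalization (sketch).

    frames: list of equal-length feature vectors, one per frame.
    Returns a new list in which each coefficient has zero mean and
    unit variance across the utterance.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Per-coefficient mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-coefficient population variance, floored at eps (an assumed
    # safeguard) so that constant coefficients do not divide by zero.
    stds = [
        math.sqrt(max(sum((f[d] - means[d]) ** 2 for f in frames) / n, eps))
        for d in range(dim)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


# Hypothetical three-frame utterance with two coefficients per frame.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(features)
```

Computing the statistics over a single utterance, rather than over all of a speaker's data, is what allows this kind of normalization to run without an off-line pass over the speaker's recordings.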
Similar resources
強健性語音辨識中能量相關特徵之改良式正規化技術的研究 (Study of the Improved Normalization Techniques of Energy-Related Features for Robust Speech Recognition) [In Chinese]
The rapid development of speech processing techniques has led to their successful use in more and more applications, such as automatic dialing, voice-based information retrieval, and identity authentication. However, unexpected variations in speech signals degrade the performance of a speech processing system and thus limit its range of application. Among these variat...
Silence feature normalization for robust speech recognition in additive noise environments
In this paper, we propose a simple yet very effective feature compensation scheme for two energy-related features, the logarithmic energy (logE) and the zeroth cepstral coefficient (c0), in order to improve their noise robustness. This compensation scheme, named silence feature normalization (SFN), uses the high-pass filtered features as the indicator for speech/non-speech classification, and t...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert spoken utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Nonlinear Spectral Transformations for Robust Speech Recognition
Recently, a nonlinear transformation of autocorrelation coefficients named Phase AutoCorrelation (PAC) coefficients has been considered for feature extraction [1]. PAC based features show improved robustness to additive noise as a result of two operations, performed during the computation of PAC, namely energy normalization and inverse cosine transformation. In spite of the improved robustness ...
Enhancing the Magnitude Spectrum of Speech Features for Robust Speech Recognition
In this article, we present an effective compensation scheme to improve noise robustness for the spectra of speech signals. In this compensation scheme, called magnitude spectrum enhancement (MSE), a voice activity detection (VAD) process is performed on the frame sequence of the utterance. The magnitude spectra of non-speech frames are then reduced while those of speech frames are amplified. I...